Multi-modal audio-visual event recognition for football analysis
Authors
Abstract
The recognition of events within multi-modal data is a challenging problem. In this paper we focus on recognising events using both audio and video data. We investigate data fusion techniques for recognising these event sequences within a Hidden Markov Model (HMM) framework that models the audio and video data sequences. Specifically, we look at the recognition of play and break sequences in football and the segmentation of football games based on these two events. Recognising relatively simple semantic events such as these is an important step towards fully automatic indexing of such video material. The experiments used approximately 3 hours of data from two games of the Euro96 competition. We propose that modelling the audio and video streams separately for each sequence and fusing the decisions from each stream should yield an accurate and robust method of segmenting multi-modal data.
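The decision-fusion idea in the abstract can be illustrated with a minimal sketch. It assumes that each stream's models have already produced a log-likelihood per event class ("play" or "break") for every segment, and it combines them with a weighted sum before picking the best class. The function names, the `audio_weight` parameter, and the example scores are hypothetical illustrations, not the fusion rule or data used in the paper.

```python
def fuse_decisions(audio_loglik, video_loglik, audio_weight=0.5):
    """Combine per-class log-likelihoods from two streams (late fusion).

    audio_loglik / video_loglik: dicts mapping a class label to the
    log-likelihood assigned by that stream's model (assumed inputs).
    audio_weight: hypothetical weight in [0, 1]; video gets the remainder.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    fused = {
        label: audio_weight * audio_loglik[label]
        + video_weight * video_loglik[label]
        for label in audio_loglik
    }
    # The fused decision is the class with the highest combined score.
    best = max(fused, key=fused.get)
    return best, fused


def segment(audio_scores, video_scores, audio_weight=0.5):
    """Label each segment by fusing the two streams' scores."""
    return [
        fuse_decisions(a, v, audio_weight)[0]
        for a, v in zip(audio_scores, video_scores)
    ]


# Made-up scores for three consecutive segments.
audio_scores = [
    {"play": -10.0, "break": -12.0},
    {"play": -15.0, "break": -11.0},
    {"play": -9.0, "break": -9.5},
]
video_scores = [
    {"play": -20.0, "break": -25.0},
    {"play": -22.0, "break": -21.0},
    {"play": -18.0, "break": -14.0},
]

labels_equal = segment(audio_scores, video_scores)
labels_audio_only = segment(audio_scores, video_scores, audio_weight=1.0)
```

In this toy data the third segment is labelled differently depending on whether the video stream is consulted, which is the kind of disagreement between streams that a fusion rule has to resolve.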
Similar resources
Audio-visual Interaction in Model Adaptation for Multi-modal Speech Recognition
This paper investigates audio-visual interaction, i.e. inter-modal influences, in linear-regressive model adaptation for multi-modal speech recognition. In multi-modal adaptation, inter-modal information may contribute to the performance of speech recognition. Thus the influence and advantage of inter-modal elements should be examined. Experiments were conducted to evaluate several transformati...
Stream weight estimation using higher order statistics in multi-modal speech recognition
In this paper, stream weight optimization for multi-modal speech recognition using audio and visual information is examined. A conventional multi-stream Hidden Markov Model (HMM) used in multi-modal speech recognition constrains the audio and visual weight factors to sum to one. This means balance between transition and observation probabiliti...
A Robust Multi-modal Speech Recognition Method Using Optical-flow Analysis
This paper proposes a new multi-modal speech recognition method using optical-flow analysis and evaluates its robustness to acoustic and visual noise. Optical flow is defined as the distribution of apparent velocities in the movement of brightness patterns in an image. Since the optical flow is computed without extracting the speaker’s lip contours and location, robust visual features can be obtaine...
Audio-Visual Event Recognition with Graphical Models
In this work, different applications for the automated detection of events have been investigated using audio-visual pattern recognition methods. The recorded data has been taken from both video surveillance and video conferences. Acoustic, visual and semantic features are extracted from the available data and are subsequently analysed with the help of graphical models. These are particularl...
On automatic annotation of meeting databases
In this paper, we present meetings as an application domain for multimedia content analysis. Meeting databases are a rich data source suitable for a variety of audio, visual and multi-modal tasks, including speech recognition, people and action recognition, and information retrieval. We specifically focus on the task of semantic annotation of audio-visual (AV) events, where annotation consists ...